Smart Paper Notes

Language statistics at different spatial, temporal, and grammatical scales

Fernanda Sánchez-Puig , Rogelio Lozano-Aranda , Dante Pérez-Méndez , Ewan Colman , Alfredo J. Morales-Guzmán , Carlos Pineda , Carlos Gershenson

Category: Natural Language Processing

2022-07-02

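Statistical linguistics has advanced considerably in recent decades as data has become available. This has allowed researchers to study how the statistical properties of languages change over time. In this work, we use data from Twitter to explore English and Spanish, considering rank diversity at different scales: temporal (from 3 to 96 hours), spatial (from 3 km to 3000+ km radii), and grammatical (from monograms to pentagrams). We find that all three scales are relevant. However, the greatest changes come from variations in the grammatical scale. At the lowest grammatical scale (monograms), the rank diversity curves are most similar, independently of the values of the other scales, the language, and the country. As the grammatical scale grows, the rank diversity curves vary more, depending on the temporal and spatial scales as well as on the language and country. We also study the statistics of Twitter-specific tokens: emojis, hashtags, and user mentions. These special types of tokens exhibit sigmoid-like behavior as a rank diversity function. Our results help quantify aspects of language statistics that seem universal and aspects that may lead to variations.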
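As a rough illustration of the quantity this abstract revolves around, the sketch below computes rank diversity for a toy corpus. It assumes the usual definition (the number of distinct tokens that occupy rank k across the time slices, divided by the number of slices); the function names, the alphabetical tie-breaking rule, and the toy sentences are illustrative assumptions, not the authors' code or data.

```python
from collections import Counter

def rank_lists(slices, top_k):
    """Return, for each time slice (a list of tokens), its top_k tokens by frequency."""
    ranked = []
    for tokens in slices:
        counts = Counter(tokens)
        # Sort by descending count, then alphabetically so ties are deterministic
        # (the tie-breaking rule is an assumption made for this sketch).
        ordered = sorted(counts, key=lambda t: (-counts[t], t))
        ranked.append(ordered[:top_k])
    return ranked

def rank_diversity(slices, top_k):
    """d(k): distinct tokens seen at rank k, divided by the number of slices."""
    ranked = rank_lists(slices, top_k)
    diversity = []
    for k in range(top_k):
        occupants = {r[k] for r in ranked if len(r) > k}
        diversity.append(len(occupants) / len(slices))
    return diversity

# Toy "time slices" of monograms (single words); purely hypothetical data.
slices = [
    "the cat saw the dog".split(),
    "the dog saw the cat".split(),
    "the bird and the fish".split(),
]
d = rank_diversity(slices, top_k=2)
print(d)
```

In this toy case the first rank is always held by the same word, so its diversity is lower than that of the second rank. Applying the same function to lists of n-grams instead of single words would correspond to moving along the grammatical scale.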

ForestEyes Project: Conception, Enhancements, and Challenges

Fernanda B. J. R. Dallaqua , Álvaro Luiz Fazenda , Fabio A. Faria

Category: Computer Vision

2022-08-24

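Rainforests play an important role in the global ecosystem. However, significant regions of them are facing deforestation and degradation for several reasons. Various government and private initiatives have been created to monitor and alert on increases in deforestation from remote sensing images, using different ways of handling the considerable amount of data generated. Citizen science projects can also be used to reach the same goal. Citizen science consists of scientific research that involves non-professional volunteers in analyzing and collecting data and in contributing their computational resources, in order to advance science and to increase the public's understanding of problems in specific fields of knowledge such as astronomy, chemistry, mathematics, and physics. In this sense, this work presents a citizen science project called ForestEyes, which uses volunteers' answers, obtained through the analysis and classification of remote sensing images, to monitor deforestation regions in rainforests. To evaluate the quality of these answers, different campaigns/workflows were launched using remote sensing images from the Brazilian Legal Amazon, and their results were compared with the official ground truth produced by the Amazon Deforestation Monitoring Project. In this work, the first two workflows, which cover the state of Rondônia in the years 2013 and 2016, received more than 35,000 answers from 383 volunteers on the 2,050 created tasks in only two and a half weeks after launch. The other four workflows, although covering the same area (Rondônia) with different setups (e.g., image segmentation method, image resolution, and detection target), received 51,035 answers from 281 volunteers on 3,358 tasks. In the performed experiments...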

Neuroevolution-based Classifiers for Deforestation Detection in Tropical Forests

Guilherme A. Pimenta , Fernanda B. J. R. Dallaqua , Alvaro Fazenda , Fabio A. Faria

Category: Computer Vision

2022-08-23

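Tropical forests are home to many of the planet's species of flora and fauna, retain billions of tons of carbon, and promote cloud and rain formation, which gives them a crucial role in the global ecosystem, besides being home to countless indigenous peoples. Unfortunately, millions of hectares of tropical forest are lost every year to deforestation or degradation. To mitigate this, monitoring and deforestation detection programs are in use, in addition to public policies for the prevention and punishment of offenders. These monitoring/detection programs generally use remote sensing images, image processing techniques, machine learning methods, and expert photointerpretation to analyze, identify, and quantify possible changes in forest cover. Several projects have proposed different computational approaches, tools, and models to efficiently identify recently deforested areas, improving deforestation monitoring programs in tropical forests. In this sense, this paper proposes the use of pattern classifiers based on the neuroevolution technique NEAT in tropical forest deforestation detection tasks. Furthermore, a novel framework called e-NEAT has been created and has achieved classification results above 90% in the target application, using an extremely reduced and limited training set to learn the classification models. These results represent a relative gain of 6.2% over the best baseline ensemble method compared in this paper.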

A computational model implementing subjectivity with the 'Room Theory'. The case of detecting Emotion from Text

Carlo Lipizzi , Dario Borrelli , Fernanda de Oliveira Capela

Category: Natural Language Processing | Machine Learning | Machine Learning (Statistics)

2020-05-12

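This work introduces a new method for taking subjectivity and general context dependency into account in text analysis, using the detection of emotions conveyed in text as an example. The proposed method accounts for subjectivity through a computational version of the Framework Theory of Marvin Minsky (1974), leveraging the approach to text vectorization of Mikolov et al. (2013), which generates distributed representations of words based on the contexts in which they appear. Our approach is based on three components: 1. a framework/"room" representing the point of view; 2. a benchmark representing the criteria for the analysis, in this case the emotion classification from Robert Plutchik's (1980) study of human emotions; 3. the document to be analyzed. By using a similarity measure between words, we are able to extract the relative relevance of the elements in the benchmark (the emotions, in our case study) for the document to be analyzed. Our method provides a measure that takes into account the point of view of the entity reading the document. The method can be applied to all cases where evaluating subjectivity is relevant to understanding the relative value or meaning of a text. Subjectivity need not be limited to human reactions; it can also be used to give a text an interpretation related to a given domain ("room"). To evaluate our method, we used a test case in the political domain.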
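The three components listed above can be sketched in a few lines. This is only a toy illustration under loud assumptions: the two-dimensional word vectors standing in for a "room", the two-emotion benchmark, the averaging of cosine similarities, and the function name `emotion_relevance` are all hypothetical choices made here, not the paper's implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# 1. The "room": hypothetical word vectors that would normally be learned
#    from a corpus representing one point of view.
room = {
    "joy": [1.0, 0.0],
    "fear": [0.0, 1.0],
    "victory": [0.9, 0.1],
    "celebrate": [0.8, 0.2],
    "threat": [0.1, 0.9],
}

# 2. The benchmark: the emotion labels to score (a two-emotion toy subset).
benchmark = ["joy", "fear"]

def emotion_relevance(doc_tokens, room, benchmark):
    """Average similarity of known document words to each benchmark word,
    normalized so the scores sum to 1 (an assumption of this sketch)."""
    known = [t for t in doc_tokens if t in room]
    if not known:
        return {e: 0.0 for e in benchmark}
    raw = {
        e: sum(cosine(room[t], room[e]) for t in known) / len(known)
        for e in benchmark
    }
    total = sum(raw.values())
    if total == 0:
        return {e: 0.0 for e in benchmark}
    return {e: s / total for e, s in raw.items()}

# 3. The document to be analyzed.
scores = emotion_relevance("they celebrate the victory".split(), room, benchmark)
print(scores)
```

Swapping in a different `room` (vectors trained on a different corpus) would change the scores for the same document, which is the sense in which the point of view enters the measure.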

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

Been Kim , Martin Wattenberg , Justin Gilmer , Carrie Cai , James Wexler , Fernanda Viegas , Rory Sayres

Category:

2017-11-30

The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net's internal state in terms of human-friendly concepts. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use CAVs as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result; for example, how sensitive a prediction of zebra is to the presence of stripes. Using the domain of image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application.
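The role of the directional derivative can be illustrated with a toy example. The sketch below estimates, by central differences, how a stand-in "class logit" changes when layer activations move along a concept direction, and then reports the fraction of inputs with positive sensitivity. The logit function, the concept vector, and the activation values are hypothetical; a real setup would obtain the gradient from the network itself and learn the concept vector from examples.

```python
import math

def directional_derivative(f, a, v, eps=1e-5):
    """Central-difference estimate of the derivative of f at point a along v."""
    plus = [ai + eps * vi for ai, vi in zip(a, v)]
    minus = [ai - eps * vi for ai, vi in zip(a, v)]
    return (f(plus) - f(minus)) / (2 * eps)

def toy_logit(a):
    # Hypothetical stand-in for "class logit as a function of layer activations".
    return math.tanh(a[0]) * a[1]

# Hypothetical concept direction in activation space (unit length).
cav = [1.0, 0.0]

# Hypothetical layer activations for three inputs of the class of interest.
activations = [[0.1, 0.3], [0.7, -0.2], [1.5, 0.9]]

# Sensitivity of the logit to the concept, one value per input.
sensitivities = [directional_derivative(toy_logit, a, cav) for a in activations]

# Fraction of inputs whose logit increases along the concept direction.
score = sum(s > 0 for s in sensitivities) / len(sensitivities)
print(sensitivities, score)
```

Here two of the three inputs have positive sensitivity, so the score is 2/3. Finite differences are used only to keep the sketch dependency-free.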

Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

Melvin Johnson , Mike Schuster , Quoc V. Le , Maxim Krikun , Yonghui Wu , Zhifeng Chen , Nikhil Thorat , Fernanda Viégas , Martin Wattenberg , Greg Corrado

Category:

2016-11-14

We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no changes to the model architecture from a standard NMT system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. The rest of the model, which includes an encoder, decoder and attention module, remains unchanged and is shared across all languages. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT using a single model without any increase in parameters, which is significantly simpler than previous proposals for Multilingual NMT. On the WMT'14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-the-art results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on WMT'14 and WMT'15 benchmarks, respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. In addition to improving the translation quality of language pairs that the model was trained with, our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation. Finally, we show analyses that hint at a universal interlingua representation in our models and show some interesting examples when mixing languages.
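The input-side change described in this abstract amounts to a small preprocessing step. The sketch below prepends an artificial target-language token to a source sentence; the `<2xx>` token spelling and the helper name `add_target_token` are assumptions for illustration, and the translation model itself is not shown.

```python
def add_target_token(sentence: str, target_lang: str) -> str:
    """Prepend an artificial token naming the desired target language.

    The "<2xx>" spelling is an illustrative assumption; any reserved token
    that the model saw during training would serve the same purpose.
    """
    return f"<2{target_lang}> {sentence}"

# The same source sentence, routed to two different target languages.
to_spanish = add_target_token("Hello, how are you?", "es")
to_german = add_target_token("Hello, how are you?", "de")
print(to_spanish)
print(to_german)
```

Because only the input text changes, the same encoder, decoder, and attention parameters serve every language pair, and a token for a pair never seen together in training is how a zero-shot request would be expressed.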
